INDIAN STARTUP ECOSYSTEM¶

 HYPOTHESIS¶

Receiving more funds in india as a startup depends on the location and how long it has lasted¶

Questions:  1.What sectors access most funds as startups in India?
            2.How has funding of startups improved overtime?
            3.Does the location of the startup influence the funding it receives?
            4.Who are the highest investors in the various sectors?The purpose of this question is to help 
             my team identify which investors to poach when we decide on what we want to venture into.
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpathes
import seaborn as sns
import warnings
import plotly.graph_objects as go
warnings.filterwarnings('ignore')

IMPORTING DATA¶

In [2]:
##importing data
Data_2018 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2018.csv")
Data_2019 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2019.csv")
Data_2020 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2020.csv")
Data_2021 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2021.csv")

PREVIEWING DATASET¶

In [3]:
Data_2021.head()
Out[3]:
Company/Brand Founded HeadQuarter Sector What it does Founders Investor Amount($) Stage
0 Unbox Robotics 2019.0 Bangalore AI startup Unbox Robotics builds on-demand AI-driven ware... Pramod Ghadge, Shahid Memon BEENEXT, Entrepreneur First $1,200,000 Pre-series A
1 upGrad 2015.0 Mumbai EdTech UpGrad is an online higher education platform. Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,... Unilazer Ventures, IIFL Asset Management $120,000,000 NaN
2 Lead School 2012.0 Mumbai EdTech LEAD School offers technology based school tra... Smita Deorah, Sumeet Mehta GSV Ventures, Westbridge Capital $30,000,000 Series D
3 Bizongo 2015.0 Mumbai B2B E-commerce Bizongo is a business-to-business online marke... Aniket Deb, Ankit Tomar, Sachin Agrawal CDC Group, IDG Capital $51,000,000 Series C
4 FypMoney 2021.0 Gurugram FinTech FypMoney is Digital NEO Bank for Teenagers, em... Kapil Banwari Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal $2,000,000 Seed
In [4]:
Data_2020.head()
Out[4]:
Company/Brand Founded HeadQuarter Sector What it does Founders Investor Amount($) Stage Unnamed: 9
0 Aqgromalin 2019 Chennai AgriTech Cultivating Ideas for Profit Prasanna Manogaran, Bharani C L Angel investors $200,000 NaN NaN
1 Krayonnz 2019 Bangalore EdTech An academy-guardian-scholar centric ecosystem ... Saurabh Dixit, Gurudutt Upadhyay GSF Accelerator $100,000 Pre-seed NaN
2 PadCare Labs 2018 Pune Hygiene management Converting bio-hazardous waste to harmless waste Ajinkya Dhariya Venture Center Undisclosed Pre-seed NaN
3 NCOME 2020 New Delhi Escrow Escrow-as-a-service platform Ritesh Tiwari Venture Catalysts, PointOne Capital $400,000 NaN NaN
4 Gramophone 2016 Indore AgriTech Gramophone is an AgTech platform enabling acce... Ashish Rajan Singh, Harshit Gupta, Nishant Mah... Siana Capital Management, Info Edge $340,000 NaN NaN
In [5]:
Data_2019.head()
Out[5]:
Company/Brand Founded HeadQuarter Sector What it does Founders Investor Amount($) Stage
0 Bombay Shaving NaN NaN Ecommerce Provides a range of male grooming products Shantanu Deshpande Sixth Sense Ventures $6,300,000 NaN
1 Ruangguru 2014.0 Mumbai Edtech A learning platform that provides topic-based ... Adamas Belva Syah Devara, Iman Usman. General Atlantic $150,000,000 Series C
2 Eduisfun NaN Mumbai Edtech It aims to make learning fun via games. Jatin Solanki Deepak Parekh, Amitabh Bachchan, Piyush Pandey $28,000,000 Fresh funding
3 HomeLane 2014.0 Chennai Interior design Provides interior designing solutions Srikanth Iyer, Rama Harinath Evolvence India Fund (EIF), Pidilite Group, FJ... $30,000,000 Series D
4 Nu Genes 2004.0 Telangana AgriTech It is a seed company engaged in production, pr... Narayana Reddy Punyala Innovation in Food and Agriculture (IFA) $6,000,000 NaN
In [6]:
Data_2018.head()
Out[6]:
Company Name Industry Round/Series Amount Location About Company
0 TheCollegeFever Brand Marketing, Event Promotion, Marketing, S... Seed 250000 Bangalore, Karnataka, India TheCollegeFever is a hub for fun, fiesta and f...
1 Happy Cow Dairy Agriculture, Farming Seed ₹40,000,000 Mumbai, Maharashtra, India A startup which aggregates milk from dairy far...
2 MyLoanCare Credit, Financial Services, Lending, Marketplace Series A ₹65,000,000 Gurgaon, Haryana, India Leading Online Loans Marketplace in India
3 PayMe India Financial Services, FinTech Angel 2000000 Noida, Uttar Pradesh, India PayMe India is an innovative FinTech organizat...
4 Eunimart E-Commerce Platforms, Retail, SaaS Seed — Hyderabad, Andhra Pradesh, India Eunimart is a one stop solution for merchants ...

ADD A NEW COLOUMN TO THE DATA SET¶

In [7]:
Data_2021["Year of funding"]= 2021
Data_2020["Year of funding"]= 2020
Data_2019["Year of funding"]= 2019
Data_2018["Year of funding"]= 2018

DROPPING DUPLICATES FROM DATASET¶

In [8]:
Data_2021.drop_duplicates(inplace = True)
Data_2020.drop_duplicates(inplace = True)
Data_2019.drop_duplicates(inplace = True)
Data_2018.drop_duplicates(inplace = True)

CHANGING THE AMOUNT COLUMN NAME FROM 2019 -2021 DATASET¶

In [9]:
Data_2021.rename(columns = {'Amount($)':'Amount(USD)'}, inplace = True)
Data_2020.rename(columns = {'Amount($)':'Amount(USD)'}, inplace = True)
Data_2019.rename(columns = {'Amount($)':'Amount(USD)'}, inplace = True)

STRIPPING DOLLAR SIGN FROM AMOUNT COLUMN¶

In [10]:
Data_2021['Amount(USD)'] = Data_2020['Amount(USD)'].replace({'\$': '', ',': ''}, regex=True)
Data_2020['Amount(USD)'] = Data_2020['Amount(USD)'].replace({'\$': '', ',': ''}, regex=True)
Data_2019['Amount(USD)'] = Data_2020['Amount(USD)'].replace({'\$': '', ',': ''}, regex=True)

REMOVING UNWANTED COLUMNS FROM 2019 - 2021 DATAFRAMES¶

In [11]:
Data_2021.drop(['What it does','Founders','Stage'],axis=1, inplace = True)
Data_2020.drop(['What it does','Founders','Stage'],axis=1, inplace = True)
Data_2019.drop(['What it does','Founders','Stage'],axis=1, inplace = True)

CLEANING 2021 DATAFRAME¶

In [12]:
Data_2021.head()
Out[12]:
Company/Brand Founded HeadQuarter Sector Investor Amount(USD) Year of funding
0 Unbox Robotics 2019.0 Bangalore AI startup BEENEXT, Entrepreneur First 200000 2021
1 upGrad 2015.0 Mumbai EdTech Unilazer Ventures, IIFL Asset Management 100000 2021
2 Lead School 2012.0 Mumbai EdTech GSV Ventures, Westbridge Capital Undisclosed 2021
3 Bizongo 2015.0 Mumbai B2B E-commerce CDC Group, IDG Capital 400000 2021
4 FypMoney 2021.0 Gurugram FinTech Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal 340000 2021
In [13]:
Data_2021.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1190 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    1190 non-null   object 
 1   Founded          1189 non-null   float64
 2   HeadQuarter      1189 non-null   object 
 3   Sector           1190 non-null   object 
 4   Investor         1129 non-null   object 
 5   Amount(USD)      1030 non-null   object 
 6   Year of funding  1190 non-null   int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 74.4+ KB
In [14]:
Data_2021.isnull().sum()
Out[14]:
Company/Brand        0
Founded              1
HeadQuarter          1
Sector               0
Investor            61
Amount(USD)        160
Year of funding      0
dtype: int64

REPLACING THE MISSING VALUES IN THE FOUNDED AND HEADQUARTER COLUMNS¶

In [15]:
Data_2021['Founded'].replace(np.NAN, value=2021, inplace=True)
In [16]:
Data_2021['HeadQuarter'].replace(np.NAN, value='Gurugram', inplace=True)

CHANGING DATATYPES¶

In [17]:
Data_2021['Founded']= Data_2021['Founded'].astype('int')
Data_2021['Amount(USD)'] = pd.to_numeric(Data_2021['Amount(USD)'], errors='coerce').fillna(0, downcast='infer')
Data_2021.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1190 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    1190 non-null   object 
 1   Founded          1190 non-null   int64  
 2   HeadQuarter      1190 non-null   object 
 3   Sector           1190 non-null   object 
 4   Investor         1129 non-null   object 
 5   Amount(USD)      1190 non-null   float64
 6   Year of funding  1190 non-null   int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 74.4+ KB

REPLACING ALL NULL VALUES IN THE INVESTOR COLUMN WITH 'UNKNWON'¶

In [18]:
Data_2021['Investor'].replace(np.NAN, value= "unknown", inplace=True) 
Data_2021.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1190 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    1190 non-null   object 
 1   Founded          1190 non-null   int64  
 2   HeadQuarter      1190 non-null   object 
 3   Sector           1190 non-null   object 
 4   Investor         1190 non-null   object 
 5   Amount(USD)      1190 non-null   float64
 6   Year of funding  1190 non-null   int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 74.4+ KB
In [19]:
Data_2021.describe()
Out[19]:
Founded Amount(USD) Year of funding
count 1190.000000 1.190000e+03 1190.0
mean 2016.637815 7.559443e+07 2021.0
std 4.521968 2.032155e+09 0.0
min 1963.000000 0.000000e+00 2021.0
25% 2015.000000 0.000000e+00 2021.0
50% 2018.000000 1.000000e+06 2021.0
75% 2020.000000 5.475000e+06 2021.0
max 2021.000000 7.000000e+10 2021.0

CLEANING 2020 DATAFRAME¶

In [20]:
Data_2020.head()
Out[20]:
Company/Brand Founded HeadQuarter Sector Investor Amount(USD) Unnamed: 9 Year of funding
0 Aqgromalin 2019 Chennai AgriTech Angel investors 200000 NaN 2020
1 Krayonnz 2019 Bangalore EdTech GSF Accelerator 100000 NaN 2020
2 PadCare Labs 2018 Pune Hygiene management Venture Center Undisclosed NaN 2020
3 NCOME 2020 New Delhi Escrow Venture Catalysts, PointOne Capital 400000 NaN 2020
4 Gramophone 2016 Indore AgriTech Siana Capital Management, Info Edge 340000 NaN 2020
In [21]:
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Company/Brand    1052 non-null   object
 1   Founded          840 non-null    object
 2   HeadQuarter      958 non-null    object
 3   Sector           1039 non-null   object
 4   Investor         1014 non-null   object
 5   Amount(USD)      1049 non-null   object
 6   Unnamed: 9       2 non-null      object
 7   Year of funding  1052 non-null   int64 
dtypes: int64(1), object(7)
memory usage: 106.3+ KB
In [22]:
Data_2020.drop(["Unnamed: 9"],axis =1,inplace =True )
Data_2020.head()
Out[22]:
Company/Brand Founded HeadQuarter Sector Investor Amount(USD) Year of funding
0 Aqgromalin 2019 Chennai AgriTech Angel investors 200000 2020
1 Krayonnz 2019 Bangalore EdTech GSF Accelerator 100000 2020
2 PadCare Labs 2018 Pune Hygiene management Venture Center Undisclosed 2020
3 NCOME 2020 New Delhi Escrow Venture Catalysts, PointOne Capital 400000 2020
4 Gramophone 2016 Indore AgriTech Siana Capital Management, Info Edge 340000 2020
In [23]:
Data_2020.isnull().sum()
Out[23]:
Company/Brand        0
Founded            212
HeadQuarter         94
Sector              13
Investor            38
Amount(USD)          3
Year of funding      0
dtype: int64

REPLACING ALL NULL VALUES¶

In [24]:
Data_2020['Investor'].replace(np.NAN, value= "unknown", inplace=True) 
Data_2020['Sector'].replace(np.NAN, value= "unknown",inplace=True)
Data_2020['Founded'].fillna(0, inplace=True)
Data_2020['HeadQuarter'].replace(np.NAN, value= "unknown",inplace=True)
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   object
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1049 non-null   object
 6   Year of funding  1052 non-null   int64 
dtypes: int64(1), object(6)
memory usage: 98.0+ KB

CHANGING THE UNDISCLOSED VALUES IN AMOUNT TO 0¶

In [25]:
Updated = Data_2020['Amount(USD)'] =='Undisclosed'
Data_2020.loc[Updated, 'Amount(USD)'] = 0
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   object
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1049 non-null   object
 6   Year of funding  1052 non-null   int64 
dtypes: int64(1), object(6)
memory usage: 98.0+ KB
In [26]:
Data_2020['Amount(USD)'] = Data_2020['Amount(USD)'].fillna(0)
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   object
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1052 non-null   object
 6   Year of funding  1052 non-null   int64 
dtypes: int64(1), object(6)
memory usage: 98.0+ KB
In [27]:
Data_2020.isnull().sum()
Out[27]:
Company/Brand      0
Founded            0
HeadQuarter        0
Sector             0
Investor           0
Amount(USD)        0
Year of funding    0
dtype: int64
In [28]:
Data_2020['Amount(USD)'] = pd.to_numeric(Data_2020['Amount(USD)'], errors='coerce').fillna(0, downcast='infer')
Data_2020['Founded'] = pd.to_numeric(Data_2020['Founded'], errors='coerce').fillna(0, downcast='infer')
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    1052 non-null   object 
 1   Founded          1052 non-null   int64  
 2   HeadQuarter      1052 non-null   object 
 3   Sector           1052 non-null   object 
 4   Investor         1052 non-null   object 
 5   Amount(USD)      1052 non-null   float64
 6   Year of funding  1052 non-null   int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 98.0+ KB
In [29]:
Updated = Data_2020['Amount(USD)'] =='Undisclosed'
Data_2020.loc[Updated, 'Amount(USD)'] = 0
Data_2020.info()
 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    1052 non-null   object 
 1   Founded          1052 non-null   int64  
 2   HeadQuarter      1052 non-null   object 
 3   Sector           1052 non-null   object 
 4   Investor         1052 non-null   object 
 5   Amount(USD)      1052 non-null   float64
 6   Year of funding  1052 non-null   int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 98.0+ KB

CLEANING 2019 DATAFRAME¶

In [30]:
Data_2019.head()
Out[30]:
Company/Brand Founded HeadQuarter Sector Investor Amount(USD) Year of funding
0 Bombay Shaving NaN NaN Ecommerce Sixth Sense Ventures 200000 2019
1 Ruangguru 2014.0 Mumbai Edtech General Atlantic 100000 2019
2 Eduisfun NaN Mumbai Edtech Deepak Parekh, Amitabh Bachchan, Piyush Pandey Undisclosed 2019
3 HomeLane 2014.0 Chennai Interior design Evolvence India Fund (EIF), Pidilite Group, FJ... 400000 2019
4 Nu Genes 2004.0 Telangana AgriTech Innovation in Food and Agriculture (IFA) 340000 2019
In [31]:
Data_2019.isnull().sum()
Out[31]:
Company/Brand       0
Founded            29
HeadQuarter        19
Sector              5
Investor            0
Amount(USD)         2
Year of funding     0
dtype: int64
In [32]:
Data_2019.isnull().sum()
Out[32]:
Company/Brand       0
Founded            29
HeadQuarter        19
Sector              5
Investor            0
Amount(USD)         2
Year of funding     0
dtype: int64
In [33]:
Data_2019["Sector"].mode()
Out[33]:
0    Edtech
Name: Sector, dtype: object
In [34]:
Data_2019['Sector'].fillna(Data_2019['Sector'].mode,inplace = True)
Data_2019.isnull().sum()
Out[34]:
Company/Brand       0
Founded            29
HeadQuarter        19
Sector              0
Investor            0
Amount(USD)         2
Year of funding     0
dtype: int64
In [35]:
###Changing amount datatype to int/float
Data_2019['Amount(USD)'] = pd.to_numeric(Data_2019['Amount(USD)'], errors='coerce').fillna(0, downcast='infer')
Data_2019.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    89 non-null     object 
 1   Founded          60 non-null     float64
 2   HeadQuarter      70 non-null     object 
 3   Sector           89 non-null     object 
 4   Investor         89 non-null     object 
 5   Amount(USD)      89 non-null     int64  
 6   Year of funding  89 non-null     int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 5.6+ KB
In [36]:
Data_2019.describe()
Out[36]:
Founded Amount(USD) Year of funding
count 60.000000 8.900000e+01 89.0
mean 2014.533333 2.791803e+07 2019.0
std 2.937003 1.095456e+08 0.0
min 2004.000000 0.000000e+00 2019.0
25% 2013.000000 0.000000e+00 2019.0
50% 2015.000000 7.500000e+05 2019.0
75% 2016.250000 7.000000e+06 2019.0
max 2019.000000 7.000000e+08 2019.0
In [37]:
null_data2 = Data_2019[Data_2019.isnull().any(axis=1)]
null_data2.head()
Out[37]:
Company/Brand Founded HeadQuarter Sector Investor Amount(USD) Year of funding
0 Bombay Shaving NaN NaN Ecommerce Sixth Sense Ventures 200000 2019
2 Eduisfun NaN Mumbai Edtech Deepak Parekh, Amitabh Bachchan, Piyush Pandey 0 2019
5 FlytBase NaN Pune Technology Undisclosed 600000 2019
6 Finly NaN Bangalore SaaS Social Capital, AngelList India, Gemba Capital... 600000 2019
8 Quantiphi NaN NaN AI & Tech Multiples Alternate Asset Management 45000000 2019
In [38]:
Data_2019['Founded'].fillna(0, inplace=True) ##replacing remaining null values with 'unknown'
Data_2019['HeadQuarter'].replace(np.NAN, value= "unknown",inplace=True)
Data_2019.info()
 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    89 non-null     object 
 1   Founded          89 non-null     float64
 2   HeadQuarter      89 non-null     object 
 3   Sector           89 non-null     object 
 4   Investor         89 non-null     object 
 5   Amount(USD)      89 non-null     int64  
 6   Year of funding  89 non-null     int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 5.6+ KB
In [39]:
Updated = Data_2020['Amount(USD)'] =='Undisclosed'
Data_2019.loc[Updated, 'Amount(USD)'] = 0
Data_2019.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    89 non-null     object 
 1   Founded          89 non-null     float64
 2   HeadQuarter      89 non-null     object 
 3   Sector           89 non-null     object 
 4   Investor         89 non-null     object 
 5   Amount(USD)      89 non-null     int64  
 6   Year of funding  89 non-null     int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 5.6+ KB

MERGING 2019 - 2021 DATAFRAMES¶

In [40]:
Columns = [Data_2019, Data_2020, Data_2021]
Merged_Data = pd.concat(Columns)
Merged_Data
Out[40]:
Company/Brand Founded HeadQuarter Sector Investor Amount(USD) Year of funding
0 Bombay Shaving 0.0 unknown Ecommerce Sixth Sense Ventures 200000.0 2019
1 Ruangguru 2014.0 Mumbai Edtech General Atlantic 100000.0 2019
2 Eduisfun 0.0 Mumbai Edtech Deepak Parekh, Amitabh Bachchan, Piyush Pandey 0.0 2019
3 HomeLane 2014.0 Chennai Interior design Evolvence India Fund (EIF), Pidilite Group, FJ... 400000.0 2019
4 Nu Genes 2004.0 Telangana AgriTech Innovation in Food and Agriculture (IFA) 340000.0 2019
... ... ... ... ... ... ... ...
1204 Gigforce 2019.0 Gurugram Staffing & Recruiting Endiya Partners 0.0 2021
1205 Vahdam 2015.0 New Delhi Food & Beverages IIFL AMC 0.0 2021
1206 Leap Finance 2019.0 Bangalore Financial Services Owl Ventures 0.0 2021
1207 CollegeDekho 2015.0 Gurugram EdTech Winter Capital, ETS, Man Capital 0.0 2021
1208 WeRize 2019.0 Bangalore Financial Services 3one4 Capital, Kalaari Capital 0.0 2021

2331 rows × 7 columns

In [41]:
Merged_Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2331 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    2331 non-null   object 
 1   Founded          2331 non-null   float64
 2   HeadQuarter      2331 non-null   object 
 3   Sector           2331 non-null   object 
 4   Investor         2331 non-null   object 
 5   Amount(USD)      2331 non-null   float64
 6   Year of funding  2331 non-null   int64  
dtypes: float64(2), int64(1), object(4)
memory usage: 145.7+ KB
In [42]:
Merged_Data['Founded']= Merged_Data['Founded'].astype('int')
Merged_Data['Year of funding']= Merged_Data['Year of funding'].astype('int')
Merged_Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2331 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    2331 non-null   object 
 1   Founded          2331 non-null   int64  
 2   HeadQuarter      2331 non-null   object 
 3   Sector           2331 non-null   object 
 4   Investor         2331 non-null   object 
 5   Amount(USD)      2331 non-null   float64
 6   Year of funding  2331 non-null   int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 145.7+ KB

STRIPPING THE FIRST VARIABLES OF THE INVESTOR COLUMN

In [43]:
Merged_Data['Investor'] = Merged_Data['Investor'].apply(str) # To apply string formatting to the whole column
Merged_Data['Investor'] = Merged_Data['Investor'].str.split(',').str[0] # To separate the values in the column by commas and select the first value only
Merged_Data['Investor'] = Merged_Data['Investor'].replace("'", "", regex=True) # Remove any ' that may be attached to the data
In [44]:
Merged_Data.replace({'Sector':{'EdTech':'Edtech','FinTech':'Fintech', 'HealthCare': 'Healthcare', 
                                          'SaaS startup':'SaaS','HealthTech': 'Healthtech', 
                                          'Ecommerce': 'E-commerce','Food':'Foodtech', 'AI startup':'AI',
                               'AgriTech':'Agritech','Logistics & Supply Chain':'Logistics','Tech':'Tech Startup',
                             'Tech':'Tech company','IT':'Information Technology','Computer software':'Computer Software',
                              'Technology':'Tech company','Hospital & Health Care':'Healthcare','Automobile':'Automotive',
                               'Healthcare':'Healthcare','Social Media':'Media','Fashion':'Apparel & Fashion'}}, inplace = True)
In [45]:
# Replacing duplicate sector names in Sector column with a single name
Merged_Data.replace({'Sector':{'EdTech':'Edtech','FinTech':'Fintech','HealthCare':'Healthcare',
                               'SaaS startup':'SaaS','HealthTech': 'Healthtech', 'Ecommerce': 'E-commerce',
                               'Food':'Foodtech','AI startup':'AI','AgriTech':'Agritech','Logistics & Supply Chain':'Logistics',
                               'IT':'Information Technology','Automobile':'Automotive','Tech':'Tech Startup'}}, inplace = True)
Merged_Data
Out[45]:
Company/Brand Founded HeadQuarter Sector Investor Amount(USD) Year of funding
0 Bombay Shaving 0 unknown E-commerce Sixth Sense Ventures 200000.0 2019
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.0 2019
2 Eduisfun 0 Mumbai Edtech Deepak Parekh 0.0 2019
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.0 2019
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.0 2019
... ... ... ... ... ... ... ...
1204 Gigforce 2019 Gurugram Staffing & Recruiting Endiya Partners 0.0 2021
1205 Vahdam 2015 New Delhi Food & Beverages IIFL AMC 0.0 2021
1206 Leap Finance 2019 Bangalore Financial Services Owl Ventures 0.0 2021
1207 CollegeDekho 2015 Gurugram Edtech Winter Capital 0.0 2021
1208 WeRize 2019 Bangalore Financial Services 3one4 Capital 0.0 2021

2331 rows × 7 columns

In [46]:
Merged_Data.replace({'HeadQuarter':{'New Delhi':'Delhi'}}, inplace = True)
In [47]:
group_by_sector = Merged_Data["Sector"].value_counts()
group_by_sector.head(40)
Out[47]:
Fintech                              257
Edtech                               215
E-commerce                            96
Healthcare                            79
Agritech                              62
Healthtech                            60
Financial Services                    60
SaaS                                  58
Logistics                             51
Automotive                            46
AI                                    42
Food & Beverages                      38
Tech company                          36
Gaming                                35
Media                                 34
Information Technology & Services     34
Computer Software                     31
Foodtech                              29
Tech Startup                          25
Retail                                24
Consumer Goods                        24
E-learning                            24
Apparel & Fashion                     20
Information Technology                18
Hospitality                           18
Health, Wellness & Fitness            17
Entertainment                         16
Real Estate                           13
unknown                               13
Cosmetics                             12
IoT                                   11
Transportation                         9
Finance                                9
Insurance                              8
Deeptech                               8
Insuretech                             7
Mobility                               7
Health                                 7
IT startup                             6
Food Industry                          6
Name: Sector, dtype: int64
In [48]:
group_by_Yearoffunding = Merged_Data["Year of funding"].value_counts()
group_by_Yearoffunding
Out[48]:
2021    1190
2020    1052
2019      89
Name: Year of funding, dtype: int64
In [49]:
Merged_Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2331 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company/Brand    2331 non-null   object 
 1   Founded          2331 non-null   int64  
 2   HeadQuarter      2331 non-null   object 
 3   Sector           2331 non-null   object 
 4   Investor         2331 non-null   object 
 5   Amount(USD)      2331 non-null   float64
 6   Year of funding  2331 non-null   int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 145.7+ KB

PLOTTING A HORIZONTAL BAR CHART¶

FOR QUESTION 1:What sectors access most funds as startups in India?¶

In [50]:
pd.set_option('display.float_format', lambda x: '%.2f' % x)
In [51]:
Merged_Data.rename(columns = {'Amount(USD)':'Amount'}, inplace = True)
In [52]:
Amount_by_sectors = Merged_Data.groupby(by = "Sector").Amount.agg(["sum","count"]).sort_values(by = ["sum"], 
                                                                                       ascending = False)
Amount_by_sectors
Out[52]:
sum count
Sector
Retail 70477743000.00 24
Information Technology & Services 70210150000.00 34
Edtech 5712520230.00 215
Fintech 4598733709.60 257
Tech company 3409583900.00 36
... ... ...
Manchester, Greater Manchester 0.00 1
Digital mortgage 0.00 1
MarTech 0.00 1
SaaS  startup 0.00 1
Food Startup 0.00 1

486 rows × 2 columns

In [102]:
plt.figure(figsize = (20,15))
plt.xticks(fontsize = 20)
plt.yticks(fontsize = 20)
sns.barplot(y = Amount_by_sectors[:10].index, x = (Amount_by_sectors["sum"])[:10])
plt.ylabel("SECTORS",fontsize = 40,fontweight = 'bold')
plt.xlabel("Total Funding Received",fontsize = 25,fontweight = 'bold')
plt.title("FUNDING RECEIVED BY SECTORS",fontsize = 40,fontweight = 'bold')
Out[102]:
Text(0.5, 1.0, 'FUNDING RECEIVED BY SECTORS')

PLOTTING A LINE PLOT¶

FOR QUESTION 2: How has funding of startups changed overtime?

SORTING 0 FROM FOUNDED COLUMN

In [54]:
Sample1 = Merged_Data[Merged_Data['Amount'] != 0].sort_values('Amount', ascending=False)
Sample1
Out[54]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding
280 Keka HR 2014 Hyderabad Information Technology & Services Recur Club 70000000000.00 2021
280 Reliance Retail Ventures Ltd 2006 Mumbai Retail Silver Lake 70000000000.00 2020
317 Infra.Market 2016 Thane Construction InnoVen Capital 3000000000.00 2021
317 Snowflake 2012 California Tech company Salesforce Ventures 3000000000.00 2020
328 WizKlub 2018 Bangalore Edtech Incubate Fund India 2200000000.00 2021
... ... ... ... ... ... ... ...
1000 VLCC Health Care 1989 Gurugram Health, Wellness & Fitness unknown 12700.00 2021
834 Get My Parking 2015 Delhi Mobility IvyCap Ventures 42.23 2021
834 Peel Works 2010 Mumbai SaaS CESC Ventures 42.23 2020
552 SATYA Microcapital 1995 Delhi Fintech BlueOrchard Finance Limited 9.60 2020
552 Indi Energy 2019 Roorkee Renewables & Environment Mumbai Angels Network 9.60 2021

1657 rows × 7 columns

In [107]:
c = list(Sample1.groupby(Sample1['Year of funding']).sum()['Amount'])
d = list(Sample1['Year of funding'].value_counts().index.sort_values())
sns.scatterplot(d,c)
plt.plot(d,c)
plt.xlabel('YEARS FUNDING WERE RECEIVED', fontsize = 22, fontweight = 'bold')
plt.ylabel('Amount Received in (USD) ', fontsize = 22, fontweight = 'bold')
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.title('FUNDING RECEIVED BY STARTUPS FROM 2019 to 2021', fontsize = 22, fontweight = 'bold')
plt.rcParams['figure.figsize'] = (20,10)
warnings.filterwarnings('ignore')

BAR GRAPH¶

QUESTTION 3 ; Does the location of the startup influence the funding it receives?¶

To get this bar chart a new dataframe was created called the Sample2 by way of getting rid of unkown values

In [56]:
Sample2 = Merged_Data[Merged_Data['HeadQuarter'] != 'unknown']
In [58]:
Merged_Data.head()
Out[58]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding
0 Bombay Shaving 0 unknown E-commerce Sixth Sense Ventures 200000.00 2019
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019
2 Eduisfun 0 Mumbai Edtech Deepak Parekh 0.00 2019
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019
In [57]:
group_by_location = Sample2["HeadQuarter"].value_counts()
group_by_location
Out[57]:
Bangalore                   758
Mumbai                      374
Delhi                       251
Gurugram                    239
Chennai                      87
                           ... 
Ludhiana                      1
Rajastan                      1
Jiaxing, Zhejiang, China      1
Shanghai, China               1
Gandhinagar                   1
Name: HeadQuarter, Length: 122, dtype: int64
In [60]:
Amount_by_location = Sample2 .groupby(by = "HeadQuarter").Amount.agg(["sum"]).sort_values(by = ["sum"], 
                                                                                       ascending = False)
Amount_by_location
Out[60]:
sum
HeadQuarter
Mumbai 77928437618.23
Hyderabad 70264697000.00
Bangalore 12930857158.00
Gurugram 3454999700.00
Delhi 3273139879.83
... ...
Bengaluru 0.00
Palmwoods, Queensland, Australia 0.00
Bhopal 0.00
San Franciscao 0.00
Guwahati 0.00

122 rows × 1 columns

Getting Top 10 of the Amount by Location Dataframe

In [61]:
Amount_by_location_top = Amount_by_location.iloc[:10]
Amount_by_location_top 
Out[61]:
sum
HeadQuarter
Mumbai 77928437618.23
Hyderabad 70264697000.00
Bangalore 12930857158.00
Gurugram 3454999700.00
Delhi 3273139879.83
Thane 3081415000.00
California 3078300000.00
Pune 1165935000.00
Noida 1035734300.00
Haryana 851549800.00
In [65]:
Amount_by_location_top .plot(kind='bar', title='FUNDING ASSOCIATED WITH THEIR LOCATIONS', ylabel='Amount Received',
         xlabel='HEADQUATERS', figsize=(25,6))
            
Out[65]:
<AxesSubplot:title={'center':'FUNDING ASSOCIATED WITH THEIR LOCATIONS'}, xlabel='HEADQUATERS', ylabel='Amount Received'>
In [ ]:
x = Merged_Data['Amount']
y = group_by_location
plt.bar(x,y)
plt.rcParams['figure.figsize'] = (20,10)
plt.title('FUNDING ASSOCIATED WITH THEIR LOCATIONS', fontsize = 19)
plt.xlabel('AMOUNT RECEIVED', fontsize = 19,)
plt.ylabel('HEADQUARTERS', fontsize = 19)
In [ ]:
fig = plt.figure(figsize = (14, 8))
plt.scatter(Sample1['Founded'], Merged_Data["Year of funding"], 
            s=Sample1["Amount"]* 0.001, alpha=0.5)
plt.show()
In [ ]:
ax = Merged_Data.plot(kind ='scatter',x = 'Founded',
                       y = 'group_by_Year of funding', figsize =(10,5),alpha =0.5,
                       color = 'yellow', s = Merged_Data["Amount" ]* 1000 + 10)

ax.set_ylabel("")
ax.set_title("HOW FUNDING HAS CHANGED OVERTIME")
ax.legend(['Amount'],loc = 'upper right', fontsize = 'x-large') 

Question 4.Who are the highest investors in the various sectors?¶

In [67]:
Sample3 = Merged_Data[Merged_Data['Investor'] != 'unknown']
Sample3 
Out[67]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding
0 Bombay Shaving 0 unknown E-commerce Sixth Sense Ventures 200000.00 2019
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019
2 Eduisfun 0 Mumbai Edtech Deepak Parekh 0.00 2019
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019
... ... ... ... ... ... ... ...
1204 Gigforce 2019 Gurugram Staffing & Recruiting Endiya Partners 0.00 2021
1205 Vahdam 2015 Delhi Food & Beverages IIFL AMC 0.00 2021
1206 Leap Finance 2019 Bangalore Financial Services Owl Ventures 0.00 2021
1207 CollegeDekho 2015 Gurugram Edtech Winter Capital 0.00 2021
1208 WeRize 2019 Bangalore Financial Services 3one4 Capital 0.00 2021

2232 rows × 7 columns

In [68]:
Sample4 = Sample3 [Sample3 ['Amount'] != 0]
Sample4 
Out[68]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding
0 Bombay Shaving 0 unknown E-commerce Sixth Sense Ventures 200000.00 2019
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019
5 FlytBase 0 Pune Tech company Undisclosed 600000.00 2019
... ... ... ... ... ... ... ...
1050 Zenduty 2019 Bangalore Computer Software StartupXseed Ventures 1500000.00 2021
1051 R for Rabbit 2014 Ahmedabad Consumer Goods Xponentia Capital Partners 13200000.00 2021
1052 Acko 2016 Bangalore Insurance General Atlantic 8000000.00 2021
1053 LoveLocal 2015 Mumbai Retail Vulcan Capital 8043000.00 2021
1054 SupplyNote 2015 Noida Food & Beverages Venture Catalysts 9000000.00 2021

1589 rows × 7 columns

In [69]:
Sample5 = Sample4 [Sample4 ['Founded'] != 0]
Sample5 
Out[69]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019
9 Lenskart 2010 Delhi E-commerce SoftBank 1000000.00 2019
10 Cub McPaws 2010 Mumbai E-commerce & AR Venture Catalysts 2000000.00 2019
... ... ... ... ... ... ... ...
1050 Zenduty 2019 Bangalore Computer Software StartupXseed Ventures 1500000.00 2021
1051 R for Rabbit 2014 Ahmedabad Consumer Goods Xponentia Capital Partners 13200000.00 2021
1052 Acko 2016 Bangalore Insurance General Atlantic 8000000.00 2021
1053 LoveLocal 2015 Mumbai Retail Vulcan Capital 8043000.00 2021
1054 SupplyNote 2015 Noida Food & Beverages Venture Catalysts 9000000.00 2021

1424 rows × 7 columns

In [70]:
Sample6 = Sample5 [Sample5 ['HeadQuarter'] != 'unknown']
Sample6 
Out[70]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019
9 Lenskart 2010 Delhi E-commerce SoftBank 1000000.00 2019
10 Cub McPaws 2010 Mumbai E-commerce & AR Venture Catalysts 2000000.00 2019
... ... ... ... ... ... ... ...
1050 Zenduty 2019 Bangalore Computer Software StartupXseed Ventures 1500000.00 2021
1051 R for Rabbit 2014 Ahmedabad Consumer Goods Xponentia Capital Partners 13200000.00 2021
1052 Acko 2016 Bangalore Insurance General Atlantic 8000000.00 2021
1053 LoveLocal 2015 Mumbai Retail Vulcan Capital 8043000.00 2021
1054 SupplyNote 2015 Noida Food & Beverages Venture Catalysts 9000000.00 2021

1367 rows × 7 columns

In [112]:
fig1 = go.Figure(
    data=go.Pie(values=Sample6 ['Investor'].value_counts()[:10].values,labels=
                Sample6 ['Investor'].value_counts()[:8].index,title='TOP INVESTORS IN THE INDIAN ECOSYTEM'))
fig1.show()

PLOTTING A PIE CHART¶

Question5:Do Indian Startups receive funds from foreign investors and which sectors receive most of these funds?

In [74]:
Amount_by_Investors_top
Out[74]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019
9 Lenskart 2010 Delhi E-commerce SoftBank 1000000.00 2019
10 Cub McPaws 2010 Mumbai E-commerce & AR Venture Catalysts 2000000.00 2019
13 JobSquare 2019 Ahmedabad HR tech Titan Capital 1200000.00 2019
15 LivFin 2017 Delhi Fintech German development finance institution DEG 660000000.00 2019
17 Zest Money 2015 Bangalore Fintech Goldman Sachs. 7500000.00 2019
19 Azah Personal Care Pvt. Ltd. 2018 Gurugram Health Kunal Bahl 1000000.00 2019
23 DROR Labs Pvt. Ltd 2018 Delhi Safety tech Inflection Point Ventures 500000.00 2019
In [75]:
# create new column using ditionary mapping
Amount_by_Investors_top["Investortypes"] = Amount_by_Investors_top['Investor'].map(
                           {'General Atlantic': 'Foreign Investor',' Evolvence India Fund (EIF), Pidilite Group,FJ': 'Domestic Investor',
                            'Venture Catalysts':'Domestic Investor','Innovation in Food and Agriculture (IFA)':'Domestic Investor',
                            'SoftBank':'Foreign Investor','Titan Capital':'Domestic Investor','German development finance institution DEG':
                            'Foreign Investor','Goldman Sachs.':'Foreign Investor','Kunal Bahl':'Domestic Investor',
                           'Inflection Point Ventures':'Domestic Investor','Evolvence India Fund (EIF)':'Domestic Investor'})
# display the dataframe
Amount_by_Investors_top
Out[75]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding Investortypes
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019 Foreign Investor
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019 Domestic Investor
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019 Domestic Investor
9 Lenskart 2010 Delhi E-commerce SoftBank 1000000.00 2019 Foreign Investor
10 Cub McPaws 2010 Mumbai E-commerce & AR Venture Catalysts 2000000.00 2019 Domestic Investor
13 JobSquare 2019 Ahmedabad HR tech Titan Capital 1200000.00 2019 Domestic Investor
15 LivFin 2017 Delhi Fintech German development finance institution DEG 660000000.00 2019 Foreign Investor
17 Zest Money 2015 Bangalore Fintech Goldman Sachs. 7500000.00 2019 Foreign Investor
19 Azah Personal Care Pvt. Ltd. 2018 Gurugram Health Kunal Bahl 1000000.00 2019 Domestic Investor
23 DROR Labs Pvt. Ltd 2018 Delhi Safety tech Inflection Point Ventures 500000.00 2019 Domestic Investor
In [76]:
Amount_by_Investors = Amount_by_Investors_top.groupby(by = 'Investortypes').Amount.agg(
    ["sum"]).sort_values(by = ["sum"], ascending = False)
Amount_by_Investors
Out[76]:
sum
Investortypes
Foreign Investor 668600000.00
Domestic Investor 5440000.00
In [77]:
g1 = go.Figure(
    data=go.Pie(values=Amount_by_Investors_top ['Investortypes'].value_counts().values,
                labels=Amount_by_Investors_top ['Investortypes'].value_counts().index,
                title='FOREIGN TO DOMESTIC INVESTMENT IN INDIA'))
g1.show()

To answer the second part of the question, the Amount_by_Investor column with only foreign¶

investors is selected and put in a new dataframe to be able do our analysis

In [78]:
Amount_by_Investors_top .plot(kind='bar', title='FUNDING ASSOCIATED WITH THEIR LOCATIONS', ylabel='Amount',
         xlabel='Sector', figsize=(25, 10))
Out[78]:
<AxesSubplot:title={'center':'FUNDING ASSOCIATED WITH THEIR LOCATIONS'}, xlabel='Sector', ylabel='Amount'>
In [79]:
Amount_by_Sector = Amount_by_Investors_top.groupby(by = 'Sector').Amount.agg(
    ["sum"]).sort_values(by = ["sum"], ascending = False)
Amount_by_Sector
Out[79]:
sum
Sector
Fintech 667500000.00
E-commerce & AR 2000000.00
HR tech 1200000.00
E-commerce 1000000.00
Health 1000000.00
Safety tech 500000.00
Interior design 400000.00
Agritech 340000.00
Edtech 100000.00
In [80]:
sns.barplot(x = Amount_by_Sector.index, y = Amount_by_Sector["sum"])
plt.title("SECTORS MOST FUNDED BY FOREIGN INVESTORS")
plt.ylabel("  ")
plt.xlabel("Total Amount of Funding per Year")
Out[80]:
Text(0.5, 0, 'Total Amount of Funding per Year')

BAR PLOT REPRESENTATION OF DATA

FUNDING WITHOUT FINTECH WHICH IS CONSISDERED AS EXTREME OUTLIER

In [83]:
Amount_by_Investors_top.drop([15,17], axis=0, inplace=True)
In [84]:
Amount_by_Investors_top
Out[84]:
Company/Brand Founded HeadQuarter Sector Investor Amount Year of funding Investortypes
1 Ruangguru 2014 Mumbai Edtech General Atlantic 100000.00 2019 Foreign Investor
3 HomeLane 2014 Chennai Interior design Evolvence India Fund (EIF) 400000.00 2019 Domestic Investor
4 Nu Genes 2004 Telangana Agritech Innovation in Food and Agriculture (IFA) 340000.00 2019 Domestic Investor
9 Lenskart 2010 Delhi E-commerce SoftBank 1000000.00 2019 Foreign Investor
10 Cub McPaws 2010 Mumbai E-commerce & AR Venture Catalysts 2000000.00 2019 Domestic Investor
13 JobSquare 2019 Ahmedabad HR tech Titan Capital 1200000.00 2019 Domestic Investor
19 Azah Personal Care Pvt. Ltd. 2018 Gurugram Health Kunal Bahl 1000000.00 2019 Domestic Investor
23 DROR Labs Pvt. Ltd 2018 Delhi Safety tech Inflection Point Ventures 500000.00 2019 Domestic Investor
In [85]:
Amount_by_Investors_top .plot(kind='bar', title='SECTORS MOST FUNDED BY FOREIGN INVESTORS', 
                              ylabel='Amount',
         xlabel='Sector', figsize=(25, 10))
Out[85]:
<AxesSubplot:title={'center':'SECTORS MOST FUNDED BY FOREIGN INVESTORS'}, xlabel='Sector', ylabel='Amount'>

Fintech is most favored by Favored by Foreign Investors

CLEANING 2018 DATAFRAME¶

In [86]:
Data_2018.head()
Out[86]:
Company Name Industry Round/Series Amount Location About Company Year of funding
0 TheCollegeFever Brand Marketing, Event Promotion, Marketing, S... Seed 250000 Bangalore, Karnataka, India TheCollegeFever is a hub for fun, fiesta and f... 2018
1 Happy Cow Dairy Agriculture, Farming Seed ₹40,000,000 Mumbai, Maharashtra, India A startup which aggregates milk from dairy far... 2018
2 MyLoanCare Credit, Financial Services, Lending, Marketplace Series A ₹65,000,000 Gurgaon, Haryana, India Leading Online Loans Marketplace in India 2018
3 PayMe India Financial Services, FinTech Angel 2000000 Noida, Uttar Pradesh, India PayMe India is an innovative FinTech organizat... 2018
4 Eunimart E-Commerce Platforms, Retail, SaaS Seed — Hyderabad, Andhra Pradesh, India Eunimart is a one stop solution for merchants ... 2018
In [87]:
Data_2018.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 525 entries, 0 to 525
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Company Name     525 non-null    object
 1   Industry         525 non-null    object
 2   Round/Series     525 non-null    object
 3   Amount           525 non-null    object
 4   Location         525 non-null    object
 5   About Company    525 non-null    object
 6   Year of funding  525 non-null    int64 
dtypes: int64(1), object(6)
memory usage: 32.8+ KB
In [88]:
Data_2018.rename(columns = {'Location':'Headquarter','Industry': 'Sector'},inplace = True)
Data_2018
Out[88]:
Company Name Sector Round/Series Amount Headquarter About Company Year of funding
0 TheCollegeFever Brand Marketing, Event Promotion, Marketing, S... Seed 250000 Bangalore, Karnataka, India TheCollegeFever is a hub for fun, fiesta and f... 2018
1 Happy Cow Dairy Agriculture, Farming Seed ₹40,000,000 Mumbai, Maharashtra, India A startup which aggregates milk from dairy far... 2018
2 MyLoanCare Credit, Financial Services, Lending, Marketplace Series A ₹65,000,000 Gurgaon, Haryana, India Leading Online Loans Marketplace in India 2018
3 PayMe India Financial Services, FinTech Angel 2000000 Noida, Uttar Pradesh, India PayMe India is an innovative FinTech organizat... 2018
4 Eunimart E-Commerce Platforms, Retail, SaaS Seed — Hyderabad, Andhra Pradesh, India Eunimart is a one stop solution for merchants ... 2018
... ... ... ... ... ... ... ...
521 Udaan B2B, Business Development, Internet, Marketplace Series C 225000000 Bangalore, Karnataka, India Udaan is a B2B trade platform, designed specif... 2018
522 Happyeasygo Group Tourism, Travel Series A — Haryana, Haryana, India HappyEasyGo is an online travel domain. 2018
523 Mombay Food and Beverage, Food Delivery, Internet Seed 7500 Mumbai, Maharashtra, India Mombay is a unique opportunity for housewives ... 2018
524 Droni Tech Information Technology Seed ₹35,000,000 Mumbai, Maharashtra, India Droni Tech manufacture UAVs and develop softwa... 2018
525 Netmeds Biotechnology, Health Care, Pharmaceutical Series C 35000000 Chennai, Tamil Nadu, India Welcome to India's most convenient pharmacy! 2018

525 rows × 7 columns

In [89]:
Data_2018.drop(['About Company','Round/Series'],axis=1, inplace = True)
Data_2018.head()
Out[89]:
Company Name Sector Amount Headquarter Year of funding
0 TheCollegeFever Brand Marketing, Event Promotion, Marketing, S... 250000 Bangalore, Karnataka, India 2018
1 Happy Cow Dairy Agriculture, Farming ₹40,000,000 Mumbai, Maharashtra, India 2018
2 MyLoanCare Credit, Financial Services, Lending, Marketplace ₹65,000,000 Gurgaon, Haryana, India 2018
3 PayMe India Financial Services, FinTech 2000000 Noida, Uttar Pradesh, India 2018
4 Eunimart E-Commerce Platforms, Retail, SaaS — Hyderabad, Andhra Pradesh, India 2018
In [90]:
Data_2018.isnull().sum()
Out[90]:
Company Name       0
Sector             0
Amount             0
Headquarter        0
Year of funding    0
dtype: int64
In [91]:
Data_2018['Sector'] = Data_2018['Sector'].str.replace('-', '', n=1)
In [92]:
Data_2018
Out[92]:
Company Name Sector Amount Headquarter Year of funding
0 TheCollegeFever Brand Marketing, Event Promotion, Marketing, S... 250000 Bangalore, Karnataka, India 2018
1 Happy Cow Dairy Agriculture, Farming ₹40,000,000 Mumbai, Maharashtra, India 2018
2 MyLoanCare Credit, Financial Services, Lending, Marketplace ₹65,000,000 Gurgaon, Haryana, India 2018
3 PayMe India Financial Services, FinTech 2000000 Noida, Uttar Pradesh, India 2018
4 Eunimart ECommerce Platforms, Retail, SaaS — Hyderabad, Andhra Pradesh, India 2018
... ... ... ... ... ...
521 Udaan B2B, Business Development, Internet, Marketplace 225000000 Bangalore, Karnataka, India 2018
522 Happyeasygo Group Tourism, Travel — Haryana, Haryana, India 2018
523 Mombay Food and Beverage, Food Delivery, Internet 7500 Mumbai, Maharashtra, India 2018
524 Droni Tech Information Technology ₹35,000,000 Mumbai, Maharashtra, India 2018
525 Netmeds Biotechnology, Health Care, Pharmaceutical 35000000 Chennai, Tamil Nadu, India 2018

525 rows × 5 columns

In [113]:
Data_2018['Sector'] =Data_2018['Sector'].apply(str) # To apply string formatting to the whole column
Data_2018['Sector'] =Data_2018['Sector'].str.split(',').str[0] # To separate the values in the column by commas and select the first value only
Data_2018['Sector'] = Data_2018['Sector'].replace("'", "", regex=True) # Remove any ' that may be attached to the data
Data_2018.head()
Out[113]:
Company Name Sector Amount Headquarter Year of funding
0 TheCollegeFever Brand Marketing 250000 Bangalore, Karnataka, India 2018
1 Happy Cow Dairy Agriculture ₹40,000,000 Mumbai, Maharashtra, India 2018
2 MyLoanCare Credit ₹65,000,000 Gurgaon, Haryana, India 2018
3 PayMe India Financial Services 2000000 Noida, Uttar Pradesh, India 2018
4 Eunimart ECommerce Platforms — Hyderabad, Andhra Pradesh, India 2018
In [114]:
Data_2018['Headquarter'] =Data_2018['Headquarter'].apply(str) # To apply string formatting to the whole column
Data_2018['Headquarter'] =Data_2018['Headquarter'].str.split(',').str[0] # To separate the values in the column by commas and select the first value only
Data_2018['Headquarter'] = Data_2018['Headquarter'].replace("'", "", regex=True) # Remove any ' that may be attached to the data
Data_2018.head()
Out[114]:
Company Name Sector Amount Headquarter Year of funding
0 TheCollegeFever Brand Marketing 250000 Bangalore 2018
1 Happy Cow Dairy Agriculture ₹40,000,000 Mumbai 2018
2 MyLoanCare Credit ₹65,000,000 Gurgaon 2018
3 PayMe India Financial Services 2000000 Noida 2018
4 Eunimart ECommerce Platforms — Hyderabad 2018

Cleaning the Amounts column¶

In [117]:
# Cleaning the Amounts column
## Removing the commas and dashes from the Amounts
Data_2018['Amount'] = Data_2018['Amount'].apply(str)
Data_2018['Amount'].replace(",", "", inplace = True, regex=True)
Data_2018['Amount'].replace("—", 0, inplace = True, regex=True)
Data_2018['Amount'].replace("$", "", inplace = True, regex=True)
In [124]:
## Creating temporary columns to help with the conversion of INR to USD
Data_2018['INR Amount'] = Data_2018['Amount'].str.rsplit('₹', n = 2).str[1]
Data_2018['INR Amount'] = Data_2018['INR Amount'].apply(float).fillna(0)
Data_2018['INR Amount'] = Data_2018['INR Amount'].fillna(0)
Data_2018['USD Amount'] = Data_2018['INR Amount'] * 0.0146
Data_2018['USD Amount'] = Data_2018['USD Amount'].replace(0, np.nan)
Data_2018['USD Amount'] = Data_2018['USD Amount'].fillna(Data_2018['Amount'])
Data_2018['USD Amount'] = Data_2018['USD Amount'].replace("$", "", regex=True)
Data_2018['Amount'] = Data_2018['USD Amount']
Data_2018["Amount"] = Data_2018["Amount"].apply(lambda x: float(str(x).replace("$","")))
Data_2018["Amount"] = Data_2018["Amount"].replace(0, np.nan)
In [128]:
Data_2018.head()
Out[128]:
Company Name Sector Amount Headquarter Year of funding
0 TheCollegeFever Brand Marketing 250000.00 Bangalore 2018
1 Happy Cow Dairy Agriculture 584000.00 Mumbai 2018
2 MyLoanCare Credit 949000.00 Gurgaon 2018
3 PayMe India Financial Services 2000000.00 Noida 2018
4 Eunimart ECommerce Platforms NaN Hyderabad 2018
In [129]:
Data_2018.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 525 entries, 0 to 525
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Company Name     525 non-null    object 
 1   Sector           525 non-null    object 
 2   Amount           377 non-null    float64
 3   Headquarter      525 non-null    object 
 4   Year of funding  525 non-null    int64  
dtypes: float64(1), int64(1), object(3)
memory usage: 24.6+ KB
In [132]:
Data_2018.isnull().sum()
Out[132]:
Company Name         0
Sector               0
Amount             148
Headquarter          0
Year of funding      0
dtype: int64
In [134]:
Data_2018['Amount'] = Data_2018['Amount'].fillna(0)
In [135]:
Data_2018.isnull().sum()
Out[135]:
Company Name       0
Sector             0
Amount             0
Headquarter        0
Year of funding    0
dtype: int64